Environmental conditions and marine heatwaves influence blue whale foraging and reproductive effort

Abstract Animal behavior is motivated by the fundamental need to feed and reproduce, and these behaviors can be inferred from spatiotemporal variations in biological signals such as vocalizations. Yet, linking foraging and reproductive effort to environmental drivers can be challenging for wide‐ranging predator species. Blue whales are acoustically active marine predators that produce two distinct vocalizations: song and D calls. We examined environmental correlates of these vocalizations using continuous recordings from five hydrophones in the South Taranaki Bight region of Aotearoa New Zealand to investigate call behavior relative to ocean conditions and infer life history patterns. D calls were strongly correlated with oceanographic drivers of upwelling in spring and summer, indicating associations with foraging effort. In contrast, song displayed a highly seasonal pattern with peak intensity in fall, which aligned with the timing of conception inferred from whaling records. Finally, during a marine heatwave, reduced foraging (inferred from D calls) was followed by lower reproductive effort (inferred from song intensity).

. As global ocean temperatures increase, the frequency and intensity of anomalous warm water events known as marine heatwaves are on the rise as well (Frölicher et al., 2018;Oliver et al., 2018). Marine heatwaves can cause a range of ecological consequences, including changes in water column structure, primary production, species composition, marine life distribution and health, and fisheries management such as closures and quota changes (Oliver et al., 2018). The effects of these environmental changes on marine species' ability to successfully feed and reproduce therefore have critical consequences for their survival.
The distribution and habitat use patterns of marine predators often reflect dynamic ecological processes by integrating multiple trophic levels, and therefore their response to changing ocean conditions is tightly linked to shifts in their prey (Cotte et al., 2011;Croll et al., 1998;Nicol et al., 2000;Silber et al., 2017). Marine heatwaves can reduce foraging success in marine predators, which can in turn decrease reproduction and population health. During the North Pacific marine heatwave in 2013-2016, common murres died in record numbers and many breeding colonies of this fish-eating seabird experienced complete reproductive failure, presumably due to an ecosystem-wide scarcity of high-quality forage species (Piatt et al., 2020).
For far-ranging marine predators such as whales, measuring such changes is challenging. Yet a handful of recent studies have correlated decreased reproductive output with reduced foraging, by either measuring or modeling population demographic parameters. Recent diminishing prey resources appear to be insufficient to support reproduction for North Atlantic right whales, likely contributing to low calving rates (Gavrilchuk et al., 2021). Southern right whale reproductive success is correlated with global climate indices and the density of their prey (Seyboth et al., 2016). Reduced calving rates of gray whales are associated with decreased sea ice cover on their Arctic foraging grounds (Perryman et al., 2021). Blue whales in the North Pacific faced with reduced foraging opportunities due to changing environmental conditions are predicted to suffer population-level consequences in terms of reproductive success (Pirotta et al., 2019).
While evidence exists that foraging and reproductive success are inextricably linked in marine predators including baleen whales, examining relationships between environmental variation and reproductive output in terms of calf counts in subsequent years misses a key step: the more immediate influence of foraging success on subsequent reproductive effort within the same year. Monitoring biological signals can be an effective tool for overcoming this hurdle, particularly for sparsely distributed, wide-ranging species for which consistent observation across their range is difficult. Indeed, animals across taxonomic groups send and receive key information via biological signals (Dall et al., 2005), including social, alarm, or food-associated signals (Clay et al., 2012;Torney et al., 2011).
In the marine environment where light attenuates quickly, lowfrequency sound propagates efficiently over long distances (Au & Hastings, 2008). Therefore, many marine species including baleen whales rely on sound as a primary sensory modality, and communication via acoustic signaling is central to behaviors related to foraging and reproduction (Tyack & Clark, 2000).
Blue whales (Balaenoptera musculus) are a globally distributed, vocally active species. Here we focus on the pygmy blue whale (B. m. brevicauda) population that utilizes the South Taranaki Bight (STB) region between the North and South Islands of Aotearoa New Zealand (Barlow et al., 2018;Torres, 2013). This population is genetically distinct, with an estimated population size of 718 individuals (95% CI = 279-1926; Barlow et al., 2018). Unlike other blue whale populations with stereotypical migrations between low latitude breeding grounds and higher latitude feeding grounds, New Zealand blue whales rely on the same habitat in the STB throughout the year for multiple critical life history processes, including foraging, nursing and raising calves, and potentially breeding (Barlow et al., 2018(Barlow et al., , 2022. Therefore, the STB region is an ideal study system for yearround monitoring of multiple life history processes in a single location, uniquely enabling us to address challenging questions about the biology, ecology, and life history of a large marine vertebrate. The STB region is home to a productive coastal upwelling system, whereby a plume of cold water originating off the northwest of the South Island (Shirtcliffe et al., 1990) supports enhanced primary productivity (Chiswell et al., 2017) and dense aggregations of krill (Nyctiphanes australis; Bradford & Chapman, 1988;Bradford-Grieve et al., 1993) that sustain an important blue whale foraging ground . While blue whale habitat use and distribution patterns have been well described for the STB region during spring and summer months Torres et al., 2020), the annual and seasonal patterns of blue whale occurrence in the area remain undescribed. The year-round presence of upwelling (Chiswell et al., 2017) indicates that blue whales may be able to forage in the STB in all seasons.
The region was impacted by well-documented, severe regional marine heatwave conditions in the summers of 2016 and 2018, with anomalously high temperatures leading to reduced primary productivity and ecosystem-scale consequences (Chiswell & Sutton, 2020).
During the summer 2016 marine heatwave, krill aggregation density was dramatically reduced compared to more typical upwelling conditions, the distribution of krill aggregations shifted further offshore, and blue whale distribution was similarly shifted . However, the impact of these marine heatwaves on blue whale foraging and reproductive effort remains unknown.
Blue whales produce two main call types, each with different temporal occurrence patterns and hypothesized biological function.
Song is composed of a limited number of sounds that are repeated to form a recognizable pattern (McDonald et al., 2006). These songs are presumed to be signals produced exclusively by males, used to mediate social interactions and maintain associations, and likely play a role in reproduction (Lewis et al., 2018;McDonald et al., 2006;Oleson, Calambokidis, et al., 2007). Across the global oceans, the occurrence patterns of blue whale songs vary seasonally (Barlow et al., 2022;Burtenshaw et al., 2004;Leroy et al., 2018Leroy et al., , 2021Letsheleha et al., 2022;McCauley et al., 2018;Samaran et al., 2013;Stafford et al., 2001;Szesciorka et al., 2020). However, these songs are stereotyped and relatively stable, with characteristics that are distinct between acoustic populations McDonald et al., 2006). The second type of calls are downswept vocalizations known as D calls, which are produced by all sexes and age classes (Lewis et al., 2018;Oleson, Calambokidis, et al., 2007) and common across populations. The function of D calls is less precisely understood, but they are a hypothesized social call, often produced in conjunction with foraging behavior (Cade et al., 2021;Lewis et al., 2018;Oleson, Calambokidis, et al., 2007). Further support for D calls as a signal of foraging comes from their relationship with upwellingdriven productivity that supports krill availability, the primary prey of blue whales Cade et al., 2021;Szesciorka et al., 2020). Studying the spatial and temporal patterns in calling relative to environmental conditions can foster insight into blue whale ecology and important life history functions (Cade et al., 2021;Oestreich et al., 2022;Szesciorka et al., 2020).
In this study, we analyze the annual cycles and environmental correlates of both blue whale call types detected in continuous recordings in the STB region to examine call function and gain insight into underlying life history patterns (i.e., timing of foraging and reproductive effort). We anticipate different seasonal patterns in song and D calls, with song showing a clear seasonal cycle in intensity that is less correlated with environmental conditions and more temporally stable. We expect D calls to occur more often in the spring and summer during periods of enhanced productivity that support prey aggregations. If D calls indeed indicate foraging effort, we expect lower D call activity during marine heatwaves when upwelling is reduced. Finally, we predict a positive correlation between the number of D call detections and subsequent song intensity, under the premise that more foraging enables more reproductive effort. Therefore, we anticipate that song will be reduced following marine heatwave events when blue whales were not able to obtain adequate energetic stores from feeding. In summary, we examine acoustic signals of blue whales in the STB to describe year-round occurrence and behavioral patterns, assess correlations with variable ocean conditions, explore relationships between foraging and reproductive effort, and ultimately shed light on potential consequences of climate change for the viability of blue whale populations.

| Detection and classification of blue whale calls
Acoustic data were recorded using five marine autonomous recording units (MARUs; Calupca et al., 2000) deployed in the STB region at depths ranging from 66 to 278 m ( Figure 1). The hydrophones had a flat frequency response (±2 dB) in the 15-585 Hz frequency band, a total sensitivity of −145.5 dB, and recorded continuously at a 2 kHz sampling rate with a high-pass filter at 10 Hz and a low-pass filter at 800 Hz. Acoustic data were collected continuously from 23 January 2016 to 3 February 2018, with brief gaps in recording approximately every six months for data retrieval and hydrophone refurbishment  (Mellinger & Clark, 2000) were implemented to automatically identify putative instances of the New Zealand song and D calls in the recordings. Five templates were selected for song, and 13 templates were used for D calls to adequately capture the greater variability among calls. Putative call detections were compared against the template with the highest spectrogram correlation score, and detection thresholds of 0.75 for song and 0.80 for D calls were applied after testing a range of possible threshold values. Detector performance was evaluated by comparison to a subset of 26 days, representing one randomly selected day per month across the full recording period. This ground-truth data set was reviewed separately for each call type, and vocalizations were manually annotated.
For song, recordings were reviewed in consecutive 15-min spectrograms within the 0-50 Hz frequency bandwidth (3000 sample Hann window; 50% overlap). For D calls, recordings were reviewed in 5min spectrograms in the 0-150 Hz bandwidth (2048 sample Hann Window, 50% overlap). Automatic detections were considered true positives if they overlapped with a manually annotated call in the ground-truth data set by at least 50% in time and frequency. Three evaluation metrics were calculated using custom MATLAB scripts: precision represents the proportion of detections that were true positives, recall is the proportion of true calls that were detected, and the false alarm rate is computed as the number of false positives per hour (Mellinger et al., 2016).
After running the song detector on the full data set, detection events were manually reviewed by an analyst at an hourly resolution, ensuring that false positives were not included in the analysis.
In addition to the hourly presence or absence of the New Zealand song, a separate metric was computed to measure the intensity of song activity. The song intensity index is calculated as the ratio of the energy in the call frequency bandwidth (23-24 Hz) relative to the energy in selected background frequencies (11, 39 Hz), and a similar approach has been effectively used to describe behavior and phenology of blue whales from changes in this acoustic signal (Oestreich et al., 2020;Širović et al., 2009). Song intensity index was summarized on a daily scale for further analysis. To illustrate the annual cycle of song in the region, the mean song intensity index per day of the year was taken across all five hydrophone locations and the full recording period.
The spectrogram template correlation detector for D calls prioritized minimizing missed detections, which meant many false positives were retained by the detector and needed to be removed from the data set prior to further analysis. We therefore created an automatic classification algorithm to classify putative D call detections as either true D calls or false positives. Putative D call detections for the period between January 2016 and March 2017 were manually reviewed, and false positives were identified. Subsequently, a suite of spectral measurements (Appendix A) was extracted in Raven for all call detections identified by the detector and manually classified as true D calls or false positives and used as a training library to develop the automatic classification algorithm. A random forest model was fitted to the data using the "ranger" package in R (Wright & Ziegler, 2017), with the classification as either true D calls or false positives as the response variable and the spectral features extracted in Raven as the predictors. The number of trees used to grow the random forest model (ntree) was set to 100, and the number of variables sampled as candidates at each split (mtry) was set to 7. The training library was used to develop and evaluate whether the random forest model could reliably be used for automated classification by training the random forest model on a subset of 75% of the data and predicting to the withheld 25%. The misclassification rate, false-negative rate, true-positive rate, false-positive rate, and true-negative rate were calculated (Table A1); this process was repeated over 100 bootstrap iterations, and the mean and standard deviation were calculated for the evaluation metrics across all bootstrap runs. To assess whether using the automatic classifier would impact ecological inference from blue whale calling patterns, temporal occurrence patterns were compared between calls identified via the manual validation and automated classification methods for the period between January 2016 and March 2017. This comparison was evaluated visually and using Pearson's correlation coefficient.
Subsequently, the automatic classifier was run on the full data set, and the number of hours per day with D calls present and total D calls per day were computed for the entire recording period. To illustrate the annual cycle of D calls in the region, the mean D call detections per day of the year were taken across all five hydrophone locations and the full recording period.

| Call detection range estimation
Sound propagation models were generated to estimate the listening range of each hydrophone unit using the range-dependent acoustic model (RAM; Collins, 1993). RAM is well suited for calculating detection areas in low-frequency soundscapes and shallow water environments like the STB and does so by simulating call propagation from a whale to a hydrophone under the conditions at the time.
The model incorporates numerous soundscape features that impact the detection range for calling blue whales, including sound speed profile through the water column, depth of the hydrophone receiver, seafloor substrate characteristics, depth of the calling whale, the source level and frequency of the vocalizations of interest, and ambient noise.
Sound speed profiles were based on the World Ocean Atlas (WOA2009), bathymetry was extracted at a 1-arc minute resolution from the ETOPO1 data set (Amante & Eakins, 2009), and geo-acoustic parameters for fine sand (Bostock et al., 2019;Wentworth, 1922; grain size phi = 3) were used in the propagation model. The depth of the calling whale was set to 25 m. The source levels and dominant frequency band differ between song and D calls, and therefore can influence detection area. While source levels have not empirically been estimated for blue whales in New Zealand, we obtained these parameters from the literature for application in the models. For song, the source level of 179 dB re 1 μPa at 1 m estimated for pygmy blue whales from the Australian population was used as a proxy (Gavrilov et al., 2011), and the dominant frequency band used was 17-50 Hz. For D calls, the source level was set to 174 dB re 1 μPa at 1 m (Samaran et al., 2010), and the frequency band was 20-100 Hz. Hourly ambient noise levels were considered the 1st percentile levels of the dominant frequency band for each call type.
Calls were simulated across a grid of points at 1-arc minute resolution (~2.25 km 2 ) at varying distances surrounding each hydrophone, and detection area was estimated given the ambient noise recorded at the hydrophone. To summarize the modeled detection distances, we calculated the mean daily detection area for each call type at each hydrophone across the entire recording period.

| Environmental data
Environmental variables were chosen based on prior investigations into the oceanographic properties of regional upwelling processes and functional drivers of blue whale foraging opportunities in the STB . Daily SST and SST anomaly measurements were acquired from the multiscale ultrahigh resolution (MUR) satellite at a 0.01-degree spatial resolution and daily temporal scale. Net primary productivity of carbon (NPP) was obtained from the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite at a 0.0125-degree spatial resolution and 8-day composite to minimize the impact of cloud cover. All satellite data were accessed via the ERDDAP server (https://coast watch.pfeg.noaa.gov/erddap) using the R package "rerddapXtracto" (Mendelssohn, 2019). For each day with acoustic recordings, SST, SST anomaly, and NPP were extracted within the calculated detection radius surrounding each hydrophone, separately for song and D calls. The mean values were then computed within the detection area to summarize the environmental conditions within the possible range of detection for vocalizing blue whales over the full data set. NPP measurements were log-transformed for all subsequent analyses to minimize skew.

| Whaling data
Whaling records were accessed from the International Whaling Commission individual catch database (Allison, 2020). Extracted data (where available) included lengths of fetuses from pregnant blue whales that were killed, along with the date and location of the catch. Based on length frequencies at different positions, data were separated between Antarctic blue whales (B. m. intermedia) and pygmy blue whales (B. m. brevicauda). Catches north of the following latitudes and south of the equator were assumed to be pygmy blue whales: 20-30° E (46°S), 30-70°E (52°S), 70-80°E (53°S), and 80°E-180° (52°S). All catches longer than 24.2 m were assigned to Antarctic blue whales and catches south of this pygmy region but specifically recorded as "pygmy" were assigned to pygmy blue whales. Catches in the Atlantic Ocean (west of 20°E) were classified as Antarctic, while those off the west coast of South America were excluded as they belong to the Chilean blue whale population ; Figure A6).
For Antarctic blue whales, a wide variety of recording methods were used by different nations, and most lengths were recorded in feet and rounded to the nearest foot. For pygmy blue whales, nearly all pygmy blue whale catches were taken either by Japan (1959/60-1963/64) or the Soviet Union (1962/63-1971/72), and lengths were recorded in metric units, mostly to the nearest cm. Note that for the Soviet data, information has been reconstructed from original logbooks for nearly all expeditions and replaces the original falsified records that were submitted to the International Whaling Commission from 1949/50 (Yablokov, 1994;Zemsky et al., 1995).
The pygmy data in the New Zealand region (145°E-180°, 32-50°S) were assessed to determine whether these blue whales fol- The fetal length measurements throughout the year were then used to infer the timing of births, based on the time at which fetal length measurements reach their maximum. Subsequently, timing of births was used to infer timing of conception, based on prior analyses of the catch records that concluded a gestation period of about 11 months for Antarctic blue whales (Laws, 1959;Mackintosh & Wheeler, 1929).

| Statistical analyses
We constructed models to examine the functional relationships between calling activity and environmental correlates, for both song and D calls across the full recording period. Boosted regression trees (BRT) are a machine learning multivariate modeling approach that combines decision tree methods with a boosting algorithm that iteratively optimizes model performance by combining a large number of decision trees (Elith et al., 2008). This approach is well suited for predicting distribution patterns, describing nonlinear relationships and can handle complex interactions between predictors. Furthermore, BRTs minimize the effect of temporal autocorrelation by randomly selecting a proportion of the data (called the bag fraction) at each iteration, and data are further split during cross-validation to evaluate model performance. Therefore, the probability of a contiguous timeseries being maintained is quite small. Models were fitted separately for each of the five hydrophones, for two reasons: first, the detection areas between hydrophones occasionally overlap which could lead to overrepresentation of call activity if combined, and second, a main objective was to examine how relationships between calling and environmental drivers varies between locations, therefore combining all hydrophones into a single model would mask any spatial heterogeneity in the functional relationships among hydrophone locations.
All BRT models were implemented using the "gbm" (Greenwell et al., 2019) and "dismo" (Hijmans et al., 2017) packages in R. For song, the response variable is the daily song intensity index, fit with a Gaussian distribution. For D calls, the response variable is number of D calls per day, fit with a Poisson distribution, which is suitable for count data. Full models were first fit for each call type across all seasons at each hydrophone, and subsequently models were fit for within-season peaks for each call type, at each hydrophone. Dynamic predictor variables included SST, SST anomaly, and log(NPP). Additionally, month was included to examine seasonal effects, and detection area was included to account for variation in the distance over which calls could be detected. All models were fit with a learning rate of 0.005, a bag fraction of 0.75, and a tree complexity of 2. Performance was assessed with two metrics, the cross validated percent deviance explained (cv.dev) and the cross-validated correlation (cv.cor). The influence of predictor variables on calling activity was assessed in two ways, through the percent contribution in the BRT model and using partial dependence plots to visualize each functional relationship between predictor and response while all other predictors are held at their mean value, implemented using the "pdp" package in R (Greenwell, 2017).
The summers of 2016 and 2018 were characterized by welldocumented marine heatwaves leading to dramatically increased temperatures and reduced primary productivity throughout the STB region, whereas summer 2017 consisted of more typical upwelling conditions Chiswell & Sutton, 2020). Therefore, key upwelling-related metrics and D calls were compared for just the summer months (January-February) between the three years.
Linear mixed models were used to estimate the effect of year while accounting for differences between hydrophone locations, with separate models for each of the three metrics of interest as the response variable: SST, NPP, and D call detections. Subsequently, post hoc tests with a Tukey correction were run to examine pairwise comparisons between years for each metric.
Finally, we fit a linear mixed model to examine the relationship between D call activity during the summer and song intensity in the subsequent fall, accounting for differences between hydrophone locations as a random factor. Calling was averaged by hydrophone, for each season in each year. This analysis covered two years, the marine heatwave year 2016 and the more typical upwelling year in 2017.

| Call detection and classification
Spectrogram correlation detectors implemented for automatic call detection showed higher performance for the more stereotyped  (Table A1). Cross-correlation analysis revealed a highly significant relationship between the number of D calls from the manual validation and automatic classifier (Pearson's correlation coefficient = .991, p < 2.2 × 10 −16 ; Figure A1). The temporal occurrence patterns visualized using the D calls identified by the automatic classifier were nearly indistinguishable from the patterns revealed by the manually validated calls (Figures A2 and A3).
These comparisons indicate that the fully automated methods for detection and classification of D calls provide a robust approach to examine blue whale occurrence patterns and ecological drivers of calling behavior (Figure 2). The average annual cycle of blue whale vocalizations revealed three seasonal peaks in calling activity, representing both concordance and divergence between the two call types (Figure 3). Song has a single annual peak in the fall, between April and June. D calls have a strong spring peak, between October and January.

| Spatial and temporal calling patterns
Subsequently, D calls have a second peak in fall that overlaps with the fall song peak, though in a narrower temporal window between F I G U R E 3 Annual cycle of calling activity. Average annual cycle in the song intensity index (dark blue) and D calls per day of the year, computed across all hydrophone locations and the entire recording period.

F I G U R E 2
Occurrence patterns of song and D calls. Left panels: number of hours per day with song detected (light blue bars) and daily song intensity index (dark blue lines) for each hydrophone (MARU 1-5) over the entire recording period. Right panels: number of hours per day with D calls detected (light green bars) and number of D call detections per day (dark green lines) for each hydrophone over the entire recording period. The hydrophone is listed at the top of each panel, corresponding to map locations in Figure 1. Periods with light gray shading indicate gaps in recording due to hydrophone refurbishment.
April and May. These periods of elevated calling activity were examined in the within-peak models.

| Environmental correlates of calling
Boosted regression tree (BRT) models performed well overall, with some variability among models fit to different call types, seasons, and hydrophones (Table 1). The models with the greatest performance measured by cross-validated percent deviance explained (cv. dev) and cross-validated correlation (cv.cor) were those fit to the full recording period, for song and D calls. Within-peak models were fitted for each period with the highest calling activity of each call type and showed more nuanced differences among environmental drivers at different times of the year (Table 2, Figure 4). Across all within-peak models for each hydrophone location, model performance dropped with the removal of month as a predictor (Table 1), but the patterns revealed by functional relationships within the different peaks yield insight into the importance of different environmental correlates seasonally.
Song intensity showed the strongest relationship with month, which contributed 59%-77% of the explained deviance in the full models ( Table 2). As was revealed by the seasonal calling plots (Figure 3), the functional relationship between song intensity and month in the BRT models shows a peak in song intensity between April-June ( Figure A4). In contrast, all environmental predictors had relatively low contributions in the full models for song (Table 2).
Within the fall peak, song intensity corresponds to an optimal SST range of 16-18°C, lower SST anomalies, and shows an inconsistent relationship with log(NPP) between the hydrophones (Figure 4).
There was an increase in song intensity with increased song detection area across all hydrophones.
Environmental correlates had stronger contribution to the full models for D calls than song (Table 2). Month still had a strong influence in the full models, with D call detections at their highest in the spring and fall. The number of D call detections increased with the detection area for all hydrophones (Table 2; Figure A5). Model performance was higher for D calls in spring than in fall, and functional relationships differed between the two seasons (Table 1, Figure 4).
In spring, increased D calls are associated with lower SST, positive SST anomalies, and increase with increasing log(NPP) before leveling off or even decreasing at very high values. In the fall, increased D calls are associated with higher log(NPP) values and lower SST anomalies. Furthermore, functional relationships differed more strongly between hydrophone locations in fall than in spring. For example, D calls in the fall showed a negative relationship with increasing SST at the offshore hydrophones (MARU1 and MARU5), whereas MARU2, MARU3, and MARU4 showed an optimal SST range of 16-18°C, much like what was observed for the fall peak in song. Overall, the functional response curves for D calls in the fall more closely resemble the functional relationships for song in fall than they do D calls in spring (Figure 4).

| Call function and life history patterns
Song intensity was at its highest between April July, with the peak taking place in May-June ( Figure 5). Examination of whaling catch records from the New Zealand region revealed alignment with the overall pattern of fetal length measurements for pygmy TA B L E 1 Boosted regression tree model performance. Note: Evaluation of the boosted regression tree models fitted for each call type and hydrophone location. All models were fit with a learning rate of 0.005, a bag fraction of 0.75, and a tree complexity of 2. For song, the response variable is the daily song intensity index, fit with a Gaussian distribution. For D calls, the response variable is number of D calls per day, fit with a Poisson distribution, which is suitable for count data. Full models were first fit for each call type across all seasons at each hydrophone, and subsequently models were fit for within-season peaks for each call type, at each hydrophone. Performance is assessed with two metrics, the cross validated percent deviance explained (cv. dev) and the cross-validated correlation (cv.cor).
and Antarctic blue whales caught in the Southern Hemisphere ( Figure 5). Calving presumably takes place around, or shortly after, the time when fetal lengths are at their maximum, indicating that births occur in April-May. Based on this timing, we infer that mating likely happens in May-June (assuming an 11-month gestation [Laws, 1959;Mackintosh & Wheeler, 1929]), which coincides with the peak song intensity. Therefore, the mean peak in recorded song intensity aligns closely with the estimated timing of conception.
In 2016 and 2018 when documented marine heatwaves took place (Chiswell & Sutton, 2020), SST was higher, log(NPP) was lower, and mean daily D call detections were lower within the detection range surrounding each of the hydrophone locations ( Figure 6).
Linear mixed models accounting for differences among hydrophone locations revealed significant interannual differences for SST

| DISCUSS ION
Our analyses of spatial and temporal calling patterns and environmental correlates from two years of acoustic monitoring  (Lewis et al., 2018;Oleson, Calambokidis, et al., 2007). Moreover, we link the peak song intensity to the time of mating inferred from fetal growth patterns ( Figure 5), demonstrating that New Zealand blue whales likely use the STB as a breeding ground and song intensity reflects their reproductive cycle. D calls were reduced during marine heatwaves when upwelling conditions (i.e., low SST, high NPP) were reduced or absent (Figures 4 and 6). These periods also corresponded with a documented reduction in both the number of krill aggregations and krill aggregation density in the region, indicating poor blue whale foraging conditions . Interestingly, D calls during the summertime are correlated with song intensity during the following fall reproductive peak (Figure 7), with lower D call rates followed by lower song intensity. Thus, our findings suggest a negative impact of marine heatwave conditions on multiple life history functions at the population scale (foraging and breeding effort) and demonstrate how climate change may impact blue whale populations through reduced feeding opportunities, with potential consequences for reproduction and population viability.

| Ecological patterns of calling
Using prior research on physical and biological drivers of blue whale foraging opportunities in the region Torres et al., 2020), models examining calling activity as a function of season and environmental predictors performed well (Table 1)  Notably, environmental correlates of D calls in the fall bear more similarities to song during the fall than to D calls in the spring ( Figure 4). For example, in the springtime, D calls show a strong negative relationship with SST. In the fall, both song and D calls display a mid-range SST optimum between 16 and 18°C (Figure 4).
This finding may be a function of different available environmental conditions during these periods, with different drivers of foraging conditions than indicated by prior studies conducted in the spring and summer , or a different function of D calls between the spring and fall peaks.
The double-peak pattern of D calls (Figure 3) is also observed in the California Current system (Lewis et al., 2018;Oleson, Wiggins, & Hildebrand, 2007) and in the Corcovado Gulf of Chile (Buchan et al., 2021). In all three regions, it seems the fall peak in D calls could represent late-season foraging associated with a fall bloom in productivity, a transition into more social behaviors relating to reproductive activity, or both.
While the environmental correlates of the spring D call peak are strongly indicative of upwelling and conditions suitable for blue whale foraging , the driving patterns behind the fall calling peaks are less evident from environmental correlates alone. Song is hypothesized to serve a reproductive function (McDonald et al., 2006;Oleson, Calambokidis, et al., 2007), and while mating has never been directly observed for blue whales, the paradigm for baleen whales is a temporally defined breeding season each year. Therefore, the phenology of the fall peak in song may be driven more predominantly by the breeding season defined by their life history timing rather than environmental cues.
While the hypothesized foraging function of D calls aligns closely with our observations during the spring and summer, the fall peak in D calls aligns more closely with song, both in terms of timing and environmental correlates (Figures 3 and 4). Although song is only produced by males, other demographic groups engage in social activities, and D calls are produced by all sexes and age classes (Lewis et al., 2018;Oleson, Calambokidis, et al., 2007). The coincident fall peak of both call types may represent a period of social behavior across demographic groups during the likely breeding season for this blue whale population. This potential use of D calls as social calls more broadly is corroborated by increased D call detections coincident with blue whale super-aggregations (Cade et al., 2021) and triads engaged in social behaviors potentially related to reproductive activity (Schall et al., 2019).

F I G U R E 5
Annual song intensity and the breeding cycle. Top panel: average yearly cycle in song intensity index, computed across the five hydrophone locations and the entire recording period; dark blue line represents a loess smoothed fit. Bottom panel: fetal length measurements from whaling catch records for Antarctic blue whales (gray, measurements rounded to the nearest foot), pygmy blue whales in the southern hemisphere (blue, measurements rounded to the nearest centimeter). Measurements from blue whales caught within the established range of the New Zealand population are denoted by the dark red triangles. Calving presumably takes place around or shortly after fetal lengths are at their maximum (April-May), which implies that mating likely occurs around May-June, coincident with the peak song intensity. Note: Many small fetuses were missed in the Antarctic blue whale data, whereas sampling was more thorough for pygmy blue whale data.
By including detection area in the models, we accounted for the potential of increased call detections with larger detection ranges while evaluating the ecological impacts of environmental predictor variables. Model results revealed that the role of environmental features differs among hydrophone locations in the STB. The presence and strength of the upwelling plume presents a distinct temperature signal closer to the source, where cold water is brought to the surface near the MARU5 location Shirtcliffe et al., 1990). Further downstream along the plume's trajectory, there is a stronger signal in surface productivity (Chiswell et al., 2017).
This pattern is evident in the model results for D calls in spring, illustrated by the high contribution of SST and low contribution of NPP for MARU5 compared to low contribution of SST and high contribution of NPP for MARU1 (Table 2) New Zealand (Barlow et al., 2018), and acoustic recordings indicate occasional presence of the New Zealand song off the coast of eastern Australia and as far north as Tonga (Balcazar et al., 2015;McCauley et al., 2018). However, given the near-constant presence of blue whale calls in the STB (Figure 2), the differences in occurrence patterns between song and D calls, and near-equal sex ratios of blue whales throughout their range , changes in calling activity likely do reflect underlying patterns in ecology and life history rather than solely changes in distribution or density.

| Reproductive cycle inferred from song and whaling records
While seasonal cycles in song activity are evident for blue whale populations around the world (Barlow et al., 2022;Buchan et al., 2020;Lewis et al., 2018;McCauley et al., 2018;Samaran et al., 2013;Stafford et al., 2001Stafford et al., , 2004, it is often challenging to disentangle changes in song detection from movement of blue whales into and out of an area. In the STB, blue whales are present year round (Barlow et al., 2022;Warren et al., 2021), minimizing the confounding variable of animals moving out of the area (Figures 2 and 3). The whales forage Torres et al., 2020) in the same area where they sing, indicating they use the same place for multiple critical life history functions (Barlow et al., 2022). While the reproductive function of song has been postulated based on behavioral observations over short time scales (Lewis et al., 2018; F I G U R E 6 Environmental conditions and D calls in summer months. Mean sea surface temperature (top), net primary productivity (middle), and daily D call detections (bottom) for the period of 1 January through 28 February in each of the three years with recording coverage, with each of the five hydrophone recording locations indicated by colors. The number of recording days for each hydrophone in each year are indicated in the bottom panel. January and February were characterized by regional marine heatwaves in 2016 and 2018, while 2017 exhibited more typical summer upwelling conditions. Pairwise comparisons resulting from a linear mixed model for each variable by year (accounting for differences among hydrophone locations) are indicated above each set of boxplots. In all cases, the linear mixed models were significant, with the two marine heatwave years characterized by comparable SST, log(NPP), and D call values that were significantly different from those measured in the more typical upwelling year. Calambokidis, et al., 2007), we make the link via the timing of reproduction inferred from biological data on the timing of conception, fetal growth, and births over a broad spatial and temporal scale.
Therefore, the cycle of song intensity in this region is likely indicative of the annual breeding cycle for this blue whale population.
Temporal patterns of blue whale fetal length measurements from whaling records (Allison, 2020) Figure A6).

| Potential marine heatwave impacts on foraging and reproduction
Reduced upwelling during the 2016 and 2018 marine heatwaves (Chiswell & Sutton, 2020;Oliver et al., 2017;Sutton & Bowen, 2019) coincided with a significant reduction in D calls across all five hydrophone locations ( Figure 6). The decrease in D calls during these periods further emphasizes their role as a foraging-related call in the spring and summer, indicating decreased feeding during anomalously warm periods. The marine heatwave conditions and impacts on blue whale foraging may serve as a harbinger of what is to come, as ocean temperatures are expected to warm and marine heatwaves are expected to increase in both frequency and intensity with climate change (Frölicher et al., 2018;Holbrook et al., 2019;Oliver et al., 2018), including in the STB region (Behrens et al., 2022).
Reduced foraging opportunities may have ramifications for population viability if blue whales are not able to obtain sufficient energetic stores to support reproduction (Pirotta et al., 2019). Furthermore, there are potential carryover effects into subsequent years if enough energy is not obtained in a foraging season (Soledade Lemos et al., 2020). This consequence of reduced foraging was indicated by the significant relationship between summertime D call activity and song intensity in the subsequent breeding season. The timing of the fall peak in song did not differ between 2016 and 2017, but the mean song intensity index was lower during the 2016 fall peak ( Figure A7). The reduction in foraging during the 2016 marine heatwave may have led to reduced reproductive activity or animals moving away from this region in the peak breeding period (Figure 7). The clear exception in this pattern is the MARU5 hydrophone, which is the furthest southwest recording location. Visual surveys during the 2016 heatwave found that blue whales were absent from the central STB; sightings were concentrated offshore, where concurrent prey mapping identified the only location with high krill density in summer 2016 . During this heatwave, the western region of the STB may have served as a refuge for krill, and blue whales may have altered their distribution in response to shifting prey availability, as reflected in both the sightings data  and higher than expected D call detections.
The increased song intensity at the furthest offshore hydrophone in 2016 does not follow the expected pattern (Figure 7), perhaps due to concentration of blue whales further offshore following the summer 2016 marine heatwave .
The impacts of marine heatwaves are wide reaching. The 2014-2016 marine heatwave in the California Current led to widespread changes in the biological structure and composition of pelagic and coastal ecosystems (Cavole et al., 2016), including anomalously low primary productivity (Kahru et al., 2018), reductions in krill abundance by up to 95% (Lavaniegos et al., 2019), shifts in the timing and location of spawning of pelagic fish stocks (Auth et al., 2018), and compression of humpback whale habitat leading to higher recorded entanglements in fishing gear (Santora et al., 2020). The 2015-2016 Tasman Sea heatwave reduced primary productivity (Chiswell & Sutton, 2020), altered zooplankton community composition (Evans et al., 2020), severely impacted the aquaculture industry (Oliver et al., 2017), and shifted blue whale distribution in the STB region during the summer foraging season . While these impacts of marine heatwaves have been documented across ecosystems and regions, the link to reproductive success in baleen whales has remained a gap in knowledge. Environmental fluctuations and reduced foraging are known to impact population health in multiple baleen whale species, including blue (Pirotta et al., 2019), gray (Lemos et al., 2021;Soledade Lemos et al., 2020), and right (Gavrilchuk et al., 2021;Seyboth et al., 2016) whales. While we could not investigate population vital rates with the data at-hand, the information on blue whale call function gained through this study enabled us to postulate a connection between F I G U R E 7 Relationship between summer D call detections and fall song intensity. Relationship between the mean number of D call detections between 1 January through 28 February and the mean song intensity index in the subsequent fall peak between 1 April and 31 June. Points are symbolized by year and hydrophone recording location.
reduced foraging during marine heatwaves and subsequent reduced reproductive activity by monitoring acoustic signals. Based on regionally resolved model projections, marine heatwaves in the Tasman Sea are expected to increase in intensity and duration, including potentially annually persistent marine heatwave conditions by the end of the century under some greenhouse gas emissions scenarios (Behrens et al., 2022). Therefore, the implications of our findings will become increasingly pertinent for the future of the New Zealand blue whale population.
Our analyses of blue whale vocalizations shed light on call function, associations with variable environmental conditions including reduced foraging during marine heatwaves, and a potential relationship between foraging and reproductive effort. Looking forward, extension of this work over longer time scales to assess the influence of environmental variability on blue whale calling, behavior, and reproduction is prudent, particularly considering the anticipated impacts of climate change and the likelihood of nonanalogous conditions in the future (Frölicher et al., 2018). Formal analysis (supporting); investigation (supporting); methodology (equal); writing -review and editing (supporting). Leigh G.

ACK N OWLED G M ENTS
We thank the New Zealand Department of Conservation for their support of this research effort. Acoustic data collection and analysis were accomplished thanks to many experts from the K. Lisa

CO N FLI C T O F I NTE R E S T
The authors declare that they have no competing interests.

DATA AVA I L A B I L I T Y S TAT E M E N T
The processed data sets generated for this study and relevant analy- Note: Each bootstrap iteration trained the model on a randomly selected 75% of the data, and predicted to the withheld 25%, and this process was repeated over 100 runs.
F I G U R E A 1 Correlation between the daily number of D calls identified from the manual validation vs. D calls identified by the automatic classifier, for the period 23 January 2016-31 March 2017.

F I G U R E A 2
Spatiotemporal detection patterns of blue whale D call occurrence and intensity visualized using D call detections that were manually validated. Hours with D calls detected per day (light green bars) and the daily number of D call detections (dark green lines) are shown between 23 January 2016 and 31 March 2016, at each hydrophone recording location.

F I G U R E A 3
Spatiotemporal detection patterns of blue whale D call occurrence and intensity visualized using D call detections that were automatically classified using the random forest model. Hours with D calls detected per day (light green bars) and the daily number of D call detections (dark green lines) are shown between 23 January 2016 and 31 March 2016, at each hydrophone recording location. Note no noticeable difference in occurrence patterns from Figure A2.

F I G U R E A 6
Whaling catches of Antarctic blue whales (blue) and pygmy blue whales (red) in the southern hemisphere, with assumed boundaries in black between each. Acronyms denote each of the currently assumed blue whale populations: central Indian ocean (CIO, Sri Lanka), south-west Indian Ocean (SWIO, Madagascar), south-east Indian Ocean (SEIO, Australia/Indonesia), south-west Pacific Ocean (SWPO, New Zealand). Key land stations in and near the pygmy blue whale region are labeled.

F I G U R E A 7
Average annual cycle in song intensity index across all five hydrophone locations by year. Dark solid lines represent a loess smoothed fit. Note the alignment of the fall peak in song intensity in both years, but the lower average song intensity in 2016 than in 2017.